Dataset summary: Dengue - grouped by patient

Report generated using dataprep.

Dengue dataset report

Overview

Dataset Statistics

Number of Variables 14
Number of Rows 15036
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 32
Duplicate Rows (%) 0.2%
Total Size in Memory 3.4 MB
Average Row Size in Memory 235.3 B
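
For orientation, the overview figures above can be approximated with plain pandas. This is a minimal sketch and not dataprep's own implementation: the helper name `overview` and the toy frame are hypothetical, and the memory figure in particular may differ from what dataprep reports.

```python
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Approximate the overview table with plain pandas (hypothetical helper)."""
    n_rows, n_cols = df.shape
    n_cells = n_rows * n_cols
    missing = int(df.isna().sum().sum())       # missing cells over all columns
    duplicates = int(df.duplicated().sum())    # rows identical to an earlier row
    return {
        "n_variables": n_cols,
        "n_rows": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_rows if n_rows else 0.0,
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
    }


# Toy data for illustration only (not taken from the dengue dataset)
toy = pd.DataFrame({
    "age": [5, 5, 9],
    "gender": ["Male", "Male", "Female"],
    "shock": [True, True, None],
})
stats = overview(toy)
```

Applied to the grouped frame, the same ratios correspond to the percentages shown above, for example 32 duplicate rows out of 15036 is roughly 0.2%.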

Variable Types

Categorical 9
Numerical 5

Variables

dsource

categorical

Distinct Count 10
Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 1.8 MB

Length

Mean 3.1392
Standard Deviation 0.995
Median 4
Minimum 2
Maximum 5

Sample

1st row 01nva
2nd row 01nva
3rd row 01nva
4th row 01nva
5th row 01nva

Letter

Count 30071
Lowercase Letter 30071
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 17130

age

numerical

Distinct Count 53
Unique (%) 0.4%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.1 MB
Mean 8.4057
Minimum 0
Maximum 18
Zeros 4
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%

Quantile Statistics

Minimum 0
5-th Percentile 2
Q1 5
Median 9
Q3 12
95-th Percentile 14
Maximum 18
Range 18
IQR 7

Descriptive Statistics

Mean 8.4057
Standard Deviation 3.9748
Variance 15.7993
Sum 126388.53
Skewness -0.06325
Kurtosis -0.8416
Coefficient of Variation 0.4729
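
Several of the figures above are derived from the others, which gives a quick consistency check. The sketch below recomputes them from the values reported for age; small differences are expected because the inputs are already rounded.

```python
# Values copied from the report for the 'age' variable
n_rows = 15036
minimum, maximum = 0, 18
q1, q3 = 5, 12
mean, std = 8.4057, 3.9748

value_range = maximum - minimum   # reported Range: 18
iqr = q3 - q1                     # reported IQR: 7
variance = std ** 2               # reported Variance: 15.7993
cv = std / mean                   # reported Coefficient of Variation: 0.4729
total = mean * n_rows             # reported Sum: 126388.53
```

The same relations (Range = Maximum - Minimum, IQR = Q3 - Q1, Variance = Standard Deviation squared, Coefficient of Variation = Standard Deviation / Mean) hold for the other numerical variables.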

gender

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.8 MB

Length

Mean 4.867
Standard Deviation 0.9911
Median 4
Minimum 4
Maximum 6

Sample

1st row Male
2nd row Female
3rd row Female
4th row Male
5th row Female

Letter

Count 73180
Lowercase Letter 58144
Space Separator 0
Uppercase Letter 15036
Dash Punctuation 0
Decimal Number 0

weight

numerical

Distinct Count 355
Unique (%) 2.4%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.1 MB
Mean 28.7932
Minimum 7.2
Maximum 114
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%

Quantile Statistics

Minimum 7.2
5-th Percentile 12
Q1 19
Median 26.5
Q3 37
95-th Percentile 52
Maximum 114
Range 106.8
IQR 18

Descriptive Statistics

Mean 28.7932
Standard Deviation 12.8574
Variance 165.3136
Sum 432935.1
Skewness 0.8498
Kurtosis 0.8271
Coefficient of Variation 0.4465

bleeding

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.8 MB

Length

Mean 4.7447
Standard Deviation 0.4361
Median 5
Minimum 4
Maximum 5

Sample

1st row True
2nd row False
3rd row True
4th row False
5th row False

Letter

Count 71341
Lowercase Letter 56305
Space Separator 0
Uppercase Letter 15036
Dash Punctuation 0
Decimal Number 0
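
The report lists no value frequencies for the two-level categorical variables, but they can be recovered from the letter counts when the two levels have different string lengths. The sketch below assumes the values are rendered exactly as shown in the sample rows ("True"/"False", "Male"/"Female"); the helper name `two_level_counts` is hypothetical.

```python
def two_level_counts(letter_count, n_rows, short_level, long_level):
    """Split n_rows into two levels from the total number of letters.

    Solves  len(short) * n_short + len(long) * n_long = letter_count
    subject to  n_short + n_long = n_rows.
    """
    len_short, len_long = len(short_level), len(long_level)
    n_long = (letter_count - len_short * n_rows) // (len_long - len_short)
    return {short_level: n_rows - n_long, long_level: n_long}


n_rows = 15036

# bleeding: Letter Count 71341
bleeding = two_level_counts(71341, n_rows, "True", "False")

# gender: Letter Count 73180
gender = two_level_counts(73180, n_rows, "Male", "Female")

share_bleeding = bleeding["True"] / n_rows
```

Under that assumption, bleeding is True for 3839 patients (about 25.5%) and gender is Female for 6518 patients (about 43.3%), which agrees with the mean lengths 4.7447 and 4.867.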

plt

numerical

Distinct Count 1453
Unique (%) 9.7%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.1 MB
Mean 1645.5563
Minimum 3
Maximum 152152
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%

Quantile Statistics

Minimum 3
5-th Percentile 24
Q1 74
Median 175
Q3 251
95-th Percentile 405.25
Maximum 152152
Range 152149
IQR 177

Descriptive Statistics

Mean 1645.5563
Standard Deviation 8719.5556
Variance 7.6031e+07
Sum 2.4743e+07
Skewness 7.1895
Kurtosis 60.1581
Coefficient of Variation 5.2988
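
The plt mean (1645.56) is far above the median (175), which points to a small number of very large values. One common way to quantify this is Tukey's fences; the sketch below computes them from the reported quartiles. The source does not say whether the large values are errors or a different unit, so this only shows where they sit relative to the bulk of the data. The helper name `tukey_fences` is hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles copied from the report for the 'plt' variable
lower, upper = tukey_fences(q1=74, q3=251)

reported_max = 152152
max_is_beyond_upper_fence = reported_max > upper
```

With these quartiles the fences are -191.5 and 516.5; the 95-th percentile (405.25) lies inside them, while the maximum lies far outside.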

shock

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.8 MB

Length

Mean 4.9524
Standard Deviation 0.213
Median 5
Minimum 4
Maximum 5

Sample

1st row True
2nd row True
3rd row True
4th row True
5th row True

Letter

Count 74464
Lowercase Letter 59428
Space Separator 0
Uppercase Letter 15036
Dash Punctuation 0
Decimal Number 0

haematocrit_percent

numerical

Distinct Count 564
Unique (%) 3.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.1 MB
Mean 41.4304
Minimum 21
Maximum 67.05
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%

Quantile Statistics

Minimum 21
5-th Percentile 33.6
Q1 37.3
Median 40.5
Q3 45
95-th Percentile 52
Maximum 67.05
Range 46.05
IQR 7.7

Descriptive Statistics

Mean 41.4304
Standard Deviation 5.6391
Variance 31.7989
Sum 622947.5593
Skewness 0.6076
Kurtosis 0.1263
Coefficient of Variation 0.1361

bleeding_gum

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.8 MB

Length

Mean 4.8942
Standard Deviation 0.3076
Median 5
Minimum 4
Maximum 5

Sample

1st row True
2nd row False
3rd row True
4th row False
5th row False

Letter

Count 73589
Lowercase Letter 58553
Space Separator 0
Uppercase Letter 15036
Dash Punctuation 0
Decimal Number 0

abdominal_pain

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.8 MB

Length

Mean 4.6887
Standard Deviation 0.463
Median 5
Minimum 4
Maximum 5

Sample

1st row True
2nd row True
3rd row True
4th row True
5th row True

Letter

Count 70500
Lowercase Letter 55464
Space Separator 0
Uppercase Letter 15036
Dash Punctuation 0
Decimal Number 0

ascites

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.8 MB

Length

Mean 4.845
Standard Deviation 0.3619
Median 5
Minimum 4
Maximum 5

Sample

1st row False
2nd row False
3rd row False
4th row False
5th row False

Letter

Count 72849
Lowercase Letter 57813
Space Separator 0
Uppercase Letter 15036
Dash Punctuation 0
Decimal Number 0

bleeding_mucosal

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.8 MB

Length

Mean 4.8212
Standard Deviation 0.3832
Median 5
Minimum 4
Maximum 5

Sample

1st row False
2nd row False
3rd row True
4th row False
5th row False

Letter

Count 72492
Lowercase Letter 57456
Space Separator 0
Uppercase Letter 15036
Dash Punctuation 0
Decimal Number 0

bleeding_skin

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.8 MB

Length

Mean 4.523
Standard Deviation 0.4995
Median 5
Minimum 4
Maximum 5

Sample

1st row False
2nd row False
3rd row True
4th row False
5th row False

Letter

Count 68008
Lowercase Letter 52972
Space Separator 0
Uppercase Letter 15036
Dash Punctuation 0
Decimal Number 0

body_temperature

numerical

Distinct Count 1220
Unique (%) 8.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.1 MB
Mean 37.8766
Minimum 35
Maximum 41.5
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%

Quantile Statistics

Minimum 35
5-th Percentile 37
Q1 37.2143
Median 37.6333
Q3 38.5
95-th Percentile 39.5
Maximum 41.5
Range 6.5
IQR 1.2857

Descriptive Statistics

Mean 37.8766
Standard Deviation 0.8523
Variance 0.7265
Sum 569512.4822
Skewness 0.8577
Kurtosis 0.04737
Coefficient of Variation 0.0225

Interactions

Correlations

Missing Values



import pandas as pd
from dataprep.eda import create_report
from pkgname.utils.data_loader import load_dengue
from pkgname.utils.print_utils import suppress_stdout, suppress_stderr

# Columns to summarise; 'study_no' identifies the patient.
features = ["dsource", "age", "gender", "weight", "bleeding", "plt",
            "shock", "haematocrit_percent", "bleeding_gum", "abdominal_pain",
            "ascites", "bleeding_mucosal", "bleeding_skin", "body_temperature"]

# Separate the context managers with a comma so that both are entered
# ('a() and b()' evaluates to a single object and enters only that one).
with suppress_stdout(), suppress_stderr():

    # Load the raw data (several rows per patient)
    df = load_dengue(usecols=['study_no'] + features)

    # Fill gaps: ffill() runs within each patient, whereas the chained
    # bfill() runs on the whole column and is not restricted to a patient.
    for feat in features:
        df[feat] = df.groupby('study_no')[feat].ffill().bfill()

    # Keep patients aged 18 or under and drop incomplete rows
    df = df.loc[df['age'] <= 18]
    df = df.dropna()

    # Collapse to one row per patient
    df = df.groupby(by="study_no", dropna=False).agg(
        dsource=pd.NamedAgg(column="dsource", aggfunc="last"),
        age=pd.NamedAgg(column="age", aggfunc="max"),
        gender=pd.NamedAgg(column="gender", aggfunc="first"),
        weight=pd.NamedAgg(column="weight", aggfunc="mean"),
        bleeding=pd.NamedAgg(column="bleeding", aggfunc="max"),
        plt=pd.NamedAgg(column="plt", aggfunc="min"),
        shock=pd.NamedAgg(column="shock", aggfunc="max"),
        haematocrit_percent=pd.NamedAgg(column="haematocrit_percent", aggfunc="max"),
        bleeding_gum=pd.NamedAgg(column="bleeding_gum", aggfunc="max"),
        abdominal_pain=pd.NamedAgg(column="abdominal_pain", aggfunc="max"),
        ascites=pd.NamedAgg(column="ascites", aggfunc="max"),
        bleeding_mucosal=pd.NamedAgg(column="bleeding_mucosal", aggfunc="max"),
        bleeding_skin=pd.NamedAgg(column="bleeding_skin", aggfunc="max"),
        body_temperature=pd.NamedAgg(column="body_temperature", aggfunc="mean"),
    ).dropna()

    # Build the report for the per-patient frame
    report = create_report(df, title="Dengue dataset report")

# Display the report
report
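
In the fill step of the script, ffill() is applied per study_no, but the chained bfill() operates on the full column. The sketch below uses a hypothetical toy frame to show how that differs from filling strictly within each patient; it is an illustration of the pandas behaviour, not a statement about which variant the authors intended.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only: patient 'a' has no weight recorded at all
toy = pd.DataFrame({
    "study_no": ["a", "a", "b", "b"],
    "weight": [np.nan, np.nan, 30.0, np.nan],
})

# Pattern used in the script: grouped ffill, then bfill over the whole column.
# The bfill pulls patient b's value backwards into patient a's rows.
chained = toy.groupby("study_no")["weight"].ffill().bfill()

# Alternative: both fills restricted to each patient.
# Patient a stays missing and would be removed by a later dropna().
per_patient = toy.groupby("study_no")["weight"].transform(
    lambda s: s.ffill().bfill()
)
```

The two variants differ only for patients who have no recorded value at all for a feature; with the per-patient variant such patients are dropped instead of inheriting a value from the next patient in row order.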

Total running time of the script: ( 0 minutes 5.352 seconds)
